1 Introduction

Bayesian ANalysis of Differential Localisation Experiments (BANDLE) is an integrative semi-supervised functional mixture model, developed by Crook et al. (2021), to obtain the probability of a protein being differentially localised between two conditions.

In this vignette we walk users through how to install and use the R (R Development Core Team 2011) Bioconductor (Gentleman et al. 2004) bandle package by simulating a well-defined differential localisation experiment from spatial proteomics data from the pRolocdata package (Gatto et al. 2014).

The BANDLE method uses posterior Bayesian computations performed using Markov-chain Monte-Carlo (MCMC) and thus uncertainty estimates are available (Gilks, Richardson, and Spiegelhalter 1995). Throughout this vignette we use the term differentially localised to pertain to proteins which are assigned to different sub-cellular localisations between two conditions. One of the main outputs of BANDLE is the probability that a protein is differentially localised between two conditions.

2 The data

In this vignette and Crook et al. (2021), the main data source that we use to study differential protein sub-cellular localisation are data from high-throughput mass spectrometry-based experiments. The data from these types of experiments traditionally yield a matrix of measurements wherein we have, for example, PSMs, peptides or proteins along the rows, and samples/channels/fractions along the columns. The bandle package uses the MSnSet class as implemented in the Bioconductor MSnbase package and thus requires users to import and store their data as a MSnSet instance. For more details on how to create a MSnSet see the relevant vignettes in pRoloc. There is also additional information and examples in the pRoloc sister package. The pRolocdata experiment data package is a good starting place to look for test data. This data package contains tens of quantitative proteomics experiments, stored as MSnSets. In the next section we load some real data from this package as a use-case to demonstrate how to run the bandle package.

2.1 Example 1: spatialtemporal proteomic profiling of a THP-1 cell line

The data used in this vignette has been published in Mulvey et al. (2021) and is currently stored as MSnSet instances in the development version of pRolocdata package. For convenience the data is also stored in this package until it is available in the next stable Bioconductor release.

Briefly, Mulvey et al. (2021) performed triplicate hyperLOPIT experiments on THP-1 human leukamia cells where the samples were analysed and collected (1) when cells were unstimulated and then (2) following 12 hours stimulation with LPS (12h-LPS).

In the following code chunk we load 4 of the datasets from the study: 2 replicates of the unstimulated and 2 replicates of the 12h-LPS stimulated samples.

data("thpLOPIT_unstimulated_rep1_mulvey2021")
data("thpLOPIT_unstimulated_rep3_mulvey2021")
data("thpLOPIT_lps_rep1_mulvey2021")
data("thpLOPIT_lps_rep3_mulvey2021")

By typing the names of the datasets we get a dataMSnSet` data summary. For example,

thpLOPIT_unstimulated_rep1_mulvey2021
## MSnSet (storageMode: lockedEnvironment)
## assayData: 5107 features, 20 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: unstim_rep1_set1_126_cyto unstim_rep1_set1_127N_F1.4 ...
##     unstim_rep1_set2_131_F24 (20 total)
##   varLabels: Tag Treatment ... Fraction (5 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: A0AVT1 A0FGR8-2 ... Q9Y6Y8 (5107 total)
##   fvarLabels: Checked_unst.r1.s1 Confidence_unst.r1.s1 ... markers (107
##     total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:  
## - - - Processing information - - -
## Loaded on Tue Jan 12 14:46:48 2021. 
## Normalised to sum of intensities. 
##  MSnbase version: 2.14.2
thpLOPIT_lps_rep1_mulvey2021
## MSnSet (storageMode: lockedEnvironment)
## assayData: 4879 features, 20 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: lps_rep1_set1_126_cyto lps_rep1_set1_127N_F1.4 ...
##     lps_rep1_set2_131_F25 (20 total)
##   varLabels: Tag Treatment ... Fraction (5 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: A0A0B4J2F0 A0AVT1 ... Q9Y6Y8 (4879 total)
##   fvarLabels: Checked_lps.r1.s1 Confidence_lps.r1.s1 ... markers (107
##     total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:  
## - - - Processing information - - -
## Loaded on Tue Jan 12 14:46:57 2021. 
## Normalised to sum of intensities. 
##  MSnbase version: 2.14.2

We see that the datasets thpLOPIT_unstimulated_rep1_mulvey2021 and thpLOPIT_lps_rep1_mulvey2021 contain 5107 and 4879 proteins respectively, across 20 TMT channels. The data is accessed through different slots of the MSnSet (see str(thpLOPIT_unstimulated_rep1_mulvey2021) for all available slots). The 3 main slots which are used most frequently are those that contain the quantitation data, the features i.e. PSM/peptide/protein information and the sample information, and these can be accessed using the functions exprs, fData, and pData, respectively.

To run bandle there are a few minimal requirements that the data must fulfil. Data are required to have - the same number of channels across conditions and replicates - the same proteins across conditons and replicates - data must be a list of MSnSet instances

If we use the dim function we see that the datasets we have loaded have the same number of channels but a different number of proteins per experiment.

dim(thpLOPIT_unstimulated_rep1_mulvey2021)
## [1] 5107   20
dim(thpLOPIT_unstimulated_rep3_mulvey2021)
## [1] 5733   20
dim(thpLOPIT_lps_rep1_mulvey2021)
## [1] 4879   20
dim(thpLOPIT_lps_rep3_mulvey2021)
## [1] 5848   20

We use the function commonFeatureNames to extract proteins that are common across all replicates. This function has a nice side effect which is that it also wraps the data into a list, ready for input into bandle.

thplopit <- commonFeatureNames(c(thpLOPIT_unstimulated_rep1_mulvey2021,  ## unstimulated rep
                                 thpLOPIT_unstimulated_rep3_mulvey2021,  ## unstimulated rep
                                 thpLOPIT_lps_rep1_mulvey2021,           ## 12h-LPS rep
                                 thpLOPIT_lps_rep3_mulvey2021))          ## 12h-LPS rep
## 3727 features in common

We now have our list of MSnSets ready for bandle with 3727 proteins common across all 4 replicates/conditions.

thplopit
## Instance of class 'MSnSetList' containig 4 objects.

We can visualise the data using the plot2D function from pRoloc

## create a character vector of title names for the plots
plot_id <- c("Unstimulated 1st rep", "Unstimulated 2nd rep",
             "12h-LPS 1st rep", "12h-LPS 2nd rep")

## plot the data
par(mfrow = c(2,2))
for (i in seq(thplopit))
    plot2D(thplopit[[i]], main = plot_id[i])
addLegend(thplopit[[4]], where = "topleft", cex = .75)

3 Running the bandle function

The main function of the bandle package is bandle, this uses a complex model to analyse the data. Markov-Chain Monte-Carlo (MCMC) is used to sample the posterior distribution of parameters and latent variables from which statistics of interest can be computed. Here we only run a few iterations for brevity but typically one needs to run thousands of iterations to ensure convergence, as well as multiple parallel chains.

3.1 Fitting Gaussian processes

First, we need to fit non-parametric regression functions to the markers profiles, upon which we place our analysis. This uses Gaussian processes. We use the fitGPmaternPC function and the fitting uses some default penalised complexity priors (see ?fitGP), which work well. However, these can be altered, which is demonstrated in the next code chunk

par(mfrow = c(3,4))
gpParams <- lapply(thplopit, function(x) fitGPmaternPC(x))

We apply the fitGPmaternPC function on every MSnSet experiment by calling lapply over the thplopit list of data. The output of fitGPmaternPC returns a list of posterior predictive means and standard deviations. As well as MAP hyperparamters for the GP. As a side effect the posterior predictive distributions are overlayed with markers protein profiles for each subcellular class.

The prior needs to form a K*3 matrix (where K is the number of subcellular classes in the data), and 3 for (1) the prior, (2) length-scale amplitude and (3) standard deviation parameters (see hyppar in ?fitGP). Increasing these values, increases the shrinkage. For more details see the manuscript by Crook et al. (2021).

K <- length(getMarkerClasses(thplopit[[1]], fcol = "markers"))
pc_prior <- matrix(NA, ncol = 3, K)
pc_prior[seq.int(1:K), ] <- matrix(rep(c(10, 60, 250),
                                       each = K), ncol = 3)

Now we have generated these complexity priors we can pass them as an argument to the fitGPmaternPC function. For example,

par(mfrow = c(3,4))
gpParams <- lapply(thplopit,
                   function(x) fitGPmaternPC(x, hyppar = pc_prior))

By looking at the plot of posterior predictives we can see the GP fit looks sensible.

3.2 Setting the prior on the weights

The next step is to set up the matrix Dirichlet prior on the mixing weights. These weights are defined across datasets so these are slightly different to mixture weights in usual mixture models. The \((i,j)^{th}\) entry is the prior probability that a protein localises to organelle \(i\) in the control and \(j\) in the treatment. This mean that off-diagonal terms have a different interpretation to diagonal terms. Since we expect relocalisation to be rare, off-diagonal terms should be small. The following functions help set up the priors and how to interpret them. The parameter q allow us to check the prior probability that more than q differential localisations are expected.

set.seed(1)
dirPrior = diag(rep(1, K)) + matrix(0.0005, nrow = K, ncol = K)
predDirPrior <- prior_pred_dir(object = thplopit[[1]],
                               dirPrior = dirPrior,
                               q = 15)

The mean number of relocalisation is small:

predDirPrior$meannotAlloc
## [1] 0.2308647

The prior probability that more than q differential localisations are expected is small

predDirPrior$tailnotAlloc
## [1] 6e-04

The full prior predictive can be visualised as histogram. The prior probability that proteins are allocated to different components between datasets concentrates around 0.

hist(predDirPrior$priornotAlloc, col = getStockcol()[1])

3.3 The bandle function

We are now ready to run the main bandle function. Remember to carefully select the datasets and replicates that define the control and treatment. Here for convenience of building the vignette we only run 2 of the triplicates for each condition and run the bandle function for a small number of iterations to minimise the vignette build-time. Typically we’d recommend you run the number of iterations (numIter) in the \(1000\)s.

control <- list(thplopit[[1]], thplopit[[2]])
treatment <- list(thplopit[[3]], thplopit[[4]])

bandleres <- bandle(objectCond1 = control,
                    objectCond2 = treatment,
                    numIter = 50,      # usually 10,000
                    burnin = 5,        # usually 5,000
                    thin = 1,          # usually 20
                    gpParams = gpParams,
                    pcPrior = pc_prior,
                    numChains = 1,     # usually >=4
                    dirPrior = dirPrior)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |===================================================================   |  96%

The bandle function generates an object of class bandleParams. The show method indicates the number of parallel chains that were run, this should typically be greater than 4 (here we use 1 just as a demo).

bandleres
## Object of class "bandleParams"
## Method: bandle 
## Number of chains: 1

4 Analysing bandle output

Before we can begin to extract protein allocation information and a list of proteins which are differentially localised between conditions, we first need to populate the bandleres object by calling the bandleProcess function.

4.1 Populating a bandleres object

Currently, the summary slots of the bandleres object are empty. The summaries function accesses them.

summaries(bandleres)
## [[1]]
## An object of class "bandleSummary"
## Slot "posteriorEstimates":
## DataFrame with 0 rows and 0 columns
## 
## Slot "diagnostics":
## <0 x 0 matrix>
## 
## Slot "bandle.joint":
## <0 x 0 matrix>
## 
## 
## [[2]]
## An object of class "bandleSummary"
## Slot "posteriorEstimates":
## DataFrame with 0 rows and 0 columns
## 
## Slot "diagnostics":
## <0 x 0 matrix>
## 
## Slot "bandle.joint":
## <0 x 0 matrix>

These can be populated as follows

bandleres <- bandleProcess(bandleres)

These slot have now been populated

summary(summaries(bandleres))
##      Length Class         Mode
## [1,] 1      bandleSummary S4  
## [2,] 1      bandleSummary S4

The posteriorEstimates slot gives posterior quantities of interest for different proteins. The object is of length 2, 1 for control and 1 for treatment.

length(summaries(bandleres))
## [1] 2

4.2 Protein allocations

One quantity of interest is the protein allocations, which we can plot in a barplot. We create a new object from populating the bandleres objects using the summaries function.

par(mfrow = c(1, 2), oma = c(6,2,2,2))

pe1 <- summaries(bandleres)[[1]]@posteriorEstimates
pe2 <- summaries(bandleres)[[2]]@posteriorEstimates

barplot(table(pe1$bandle.allocation), col = getStockcol()[2], las = 2, main = "Protein allocation: control")
barplot(table(pe2$bandle.allocation), col = getStockcol()[2], las = 2, main = "Protein allocation: treatment")

The bar plot tells us for this case study that bandle has allocated the majority of unlabelled proteins to the nucleus (irrespective of the posterior probability).

4.3 Differential localisation probability

As previously mentioned the term “differentially localised” is used to pertain to proteins which are assigned to different sub-cellular localisations between two conditions. For the majority of users this is the output they are keen to extract using the BANDLE method.

Following on from the above example, after extracting posterior estimates for each condition using the summaries function we can also access the differential localisation probability as it is also stored here. The differential localisation probability tells us which proteins are most likely to differentially localise. The rank plot is a good visualisation. Indicating that most proteins are not differentially localised and there are a few confident differentially localised protiens of interest.

diffloc_probs <- pe1$bandle.differential.localisation
plot(diffloc_probs[order(diffloc_probs, decreasing = TRUE)],
     col = getStockcol()[3], pch = 19, ylab = "Probability", 
     xlab = "Rank", main = "Differential localisation rank plot")

If we take a 1% FDR and examine how many proteins get a differential probability greater than 0.99 we find there are 626 proteins above this threshold.

length(which(diffloc_probs[order(diffloc_probs, decreasing = TRUE)] > 0.99))
## [1] 625

4.4 Estimating the uncertainty

4.4.1 Applying the bootstrapdiffLocprob function

We can examine the top n proteins (here top = 100) and produce bootstrap estimates of the uncertainty (note here the uncertainty is likely to be underestimated as we did not produce many MCMC samples). These can be visualised as ranked boxplots

set.seed(1)
bt <- bootstrapdiffLocprob(params = bandleres, top = 100)
boxplot(t(bt), col = getStockcol()[5],
        las = 2, ylab = "Probability", ylim = c(0, 1),
        main = "Differential localisation \nprobability plot (top 100 proteins)")

4.5 Applying the binomDiffLoc function

In addition to the

4.6 Extracting probability estimates

5 Visualising differential localisation

We can visualise the changes in localisation between conditions on an alluvial plot using the plotTranslocations function

plotTranslocations(bandleres)

Or alternatively, on a chord (circos) diagram

plotTranslocations(bandleres, type = "chord")

6 Allocation probabilities

The allocation probabilities are stored in the tagm.joint slot. These could be visualised in a heatmap

bjoint_control <- summaries(bandleres)[[1]]@bandle.joint
pheatmap(bjoint_control, cluster_cols = FALSE, color = viridis(n = 25))

bjoint_treatment <- summaries(bandleres)[[2]]@bandle.joint
pheatmap(bjoint_treatment, cluster_cols = FALSE, color = viridis(n = 25))

–> –> –> –>

7 Session information

All software and respective versions used to produce this document are listed below.

sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] viridis_0.6.0        viridisLite_0.4.0    pheatmap_1.0.12     
##  [4] pRolocdata_1.29.1    ggplot2_3.3.3        bandle_1.0          
##  [7] pRoloc_1.30.0        BiocParallel_1.24.1  MLInterfaces_1.70.0 
## [10] cluster_2.1.1        annotate_1.68.0      XML_3.99-0.6        
## [13] AnnotationDbi_1.52.0 IRanges_2.24.1       MSnbase_2.16.1      
## [16] ProtGenerics_1.22.0  mzR_2.24.1           Rcpp_1.0.7          
## [19] Biobase_2.50.0       S4Vectors_0.28.1     BiocGenerics_0.36.0 
## [22] BiocStyle_2.18.1    
## 
## loaded via a namespace (and not attached):
##   [1] circlize_0.4.12       BiocFileCache_1.14.0  plyr_1.8.6           
##   [4] splines_4.0.5         digest_0.6.27         foreach_1.5.1        
##   [7] htmltools_0.5.1.1     magick_2.7.3          gdata_2.18.0         
##  [10] ggalluvial_0.12.3     fansi_0.4.2           magrittr_2.0.1       
##  [13] memoise_2.0.0         doParallel_1.0.16     mixtools_1.2.0       
##  [16] limma_3.46.0          recipes_0.1.15        gower_0.2.2          
##  [19] askpass_1.1           lpSolve_5.6.15        prettyunits_1.1.1    
##  [22] colorspace_2.0-0      ggrepel_0.9.1         blob_1.2.1           
##  [25] rappdirs_0.3.3        xfun_0.22             dplyr_1.0.5          
##  [28] crayon_1.4.1          jsonlite_1.7.2        hexbin_1.28.2        
##  [31] impute_1.64.0         survival_3.2-10       iterators_1.0.13     
##  [34] glue_1.4.2            gtable_0.3.0          ipred_0.9-11         
##  [37] zlibbioc_1.36.0       kernlab_0.9-29        shape_1.4.5          
##  [40] scales_1.1.1          vsn_3.58.0            mvtnorm_1.1-1        
##  [43] DBI_1.1.1             xtable_1.8-4          progress_1.2.2       
##  [46] bit_4.0.4             proxy_0.4-25          mclust_5.4.7         
##  [49] preprocessCore_1.52.1 lbfgs_1.2.1           lava_1.6.9           
##  [52] prodlim_2019.11.13    sampling_2.9          httr_1.4.2           
##  [55] FNN_1.1.3             RColorBrewer_1.1-2    ellipsis_0.3.1       
##  [58] farver_2.1.0          pkgconfig_2.0.3       nnet_7.3-15          
##  [61] sass_0.3.1            dbplyr_2.1.1          utf8_1.2.1           
##  [64] caret_6.0-86          tidyselect_1.1.0      rlang_0.4.10         
##  [67] reshape2_1.4.4        munsell_0.5.0         tools_4.0.5          
##  [70] LaplacesDemon_16.1.4  cachem_1.0.4          generics_0.1.0       
##  [73] RSQLite_2.2.6         evaluate_0.14         stringr_1.4.0        
##  [76] fastmap_1.1.0         mzID_1.28.0           yaml_2.2.1           
##  [79] ModelMetrics_1.2.2.2  knitr_1.33            bit64_4.0.5          
##  [82] purrr_0.3.4           randomForest_4.6-14   dendextend_1.14.0    
##  [85] ncdf4_1.17            nlme_3.1-152          xml2_1.3.2           
##  [88] biomaRt_2.46.3        compiler_4.0.5        curl_4.3             
##  [91] e1071_1.7-6           affyio_1.60.0         tibble_3.1.0         
##  [94] bslib_0.2.4           stringi_1.5.3         highr_0.8            
##  [97] lattice_0.20-41       Matrix_1.3-2          vctrs_0.3.7          
## [100] pillar_1.6.0          lifecycle_1.0.0       BiocManager_1.30.12  
## [103] GlobalOptions_0.1.2   jquerylib_0.1.3       MALDIquant_1.19.3    
## [106] data.table_1.14.0     R6_2.5.0              pcaMethods_1.82.0    
## [109] affy_1.68.0           bookdown_0.21         gridExtra_2.3        
## [112] codetools_0.2-18      MASS_7.3-53.1         gtools_3.8.2         
## [115] assertthat_0.2.1      openssl_1.4.3         withr_2.4.1          
## [118] hms_1.0.0             grid_4.0.5            rpart_4.1-15         
## [121] timeDate_3043.102     tidyr_1.1.3           coda_0.19-4          
## [124] class_7.3-18          rmarkdown_2.7         segmented_1.3-3      
## [127] pROC_1.17.0.1         lubridate_1.7.10

References

Crook, Oliver M, Colin TR Davies, Laurent Gatto, Paul DW Kirk, and Kathryn S Lilley. 2021. “Inferring Differential Subcellular Localisation in Comparative Spatial Proteomics Using BANDLE.” bioRxiv.
Gatto, Laurent, Lisa M. Breckels, Samuel Wieczorek, Thomas Burger, and Kathryn S. Lilley. 2014. “Mass-Spectrometry Based Spatial Proteomics Data Analysis Using pRoloc and pRolocdata.” Bioinformatics.
Gentleman, Robert C., Vincent J. Carey, Douglas M. Bates, Ben Bolstad, Marcel Dettling, Sandrine Dudoit, Byron Ellis, et al. 2004. “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biol 5 (10): –80. https://doi.org/10.1186/gb-2004-5-10-r80.
Gilks, Walter R, Sylvia Richardson, and David Spiegelhalter. 1995. Markov Chain Monte Carlo in Practice. CRC press.
Mulvey, Claire M., Lisa M. Breckels, Oliver M. Crook, David J. Sanders, Andre L. R. Ribeiro, Aikaterini Geladaki, Andy Christoforou, et al. 2021. “Spatiotemporal Proteomic Profiling of the Pro-Inflammatory Response to Lipopolysaccharide in the THP-1 Human Leukaemia Cell Line.” Nature Communications 12 (1). https://doi.org/10.1038/s41467-021-26000-9.
R Development Core Team. 2011. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/.